Conversation
d33bs
left a comment
There was a problem hiding this comment.
Nice job! I'm requesting changes mainly because I think there's existing tooling you could make use of to reduce the amount of new code you need to create.
If you feel strongly about keeping things as-is just let me know and I'll circle back to give this more thought.
| # ZedProfiler | ||
|
|
||
| [](#quality-gates) | ||
| [](#quality-gates) |
There was a problem hiding this comment.
How could we validate this coverage? At the moment this seems static, meaning it must be updated each time by some unknown force. If you have a reproducible method which updates this, consider documenting it under contributing.md. Otherwise, consider removing it altogether because part of code coverage is about building trust and reproducibility (which we directly collide with if we supply a measurement without any validation). A side effect of doing this is that it should increase development velocity by making it impossible to avoid an update if coverage changes.
| The package accepts either: | ||
| The package accepts: | ||
| - Single-channel 3D arrays shaped (z, y, x) | ||
| - Multi-channel 4D arrays shaped (c, z, y, x) |
| EXPECTED_SPATIAL_DIMS = 3 | ||
| TWO_DIMENSIONAL = 2 | ||
| FOUR_DIMENSIONAL = 4 | ||
| FIVE_OR_MORE_DIMENSIONS = 5 | ||
| REQUIRED_RETURN_KEYS = ("image_array", "features", "metadata") |
There was a problem hiding this comment.
Consider using Pydantic to help define classes which may be used for validation broadly. You could also likely incorporate Pandera, if DataFrames are to be included somehow, which integrates nicely with Pydantic. Doing this will save you effort from building bespoke data / contract validation tooling.
As you go through this effort you can use concepts from object-orientated programming (OOP) to help you modulate what is being validated and worked on at an atomic level within the software. For instance, what do these attributes pertain to - maybe it's "widget_x"? You can label it as such, using defined objects which are both validated and also passed by type to other portions of the software. You might eventually find that you also want "widget_y" with separate attributes too. Keep separation of concerns in mind and at the same time, avoid over-specific OOP ("super good enough" is often best).
| @@ -1,31 +1,251 @@ | |||
| """Core data contracts shared across featurizers. | |||
There was a problem hiding this comment.
I strongly encourage greater specificity here with concern to "what kind" of contract, now that you're building this out. Specifically, Design by contract, Hoare Triples, and Abstract Data Typing come to mind.
| return True | ||
|
|
||
|
|
||
| def validate_return_schema_contract( |
There was a problem hiding this comment.
I'm unsure if this comes into play here exactly, but this came to mind: Consider using beartype to help validate objects at time of use.
| TEST_DIR = Path(__file__).parent | ||
| sys.path.insert(0, str(TEST_DIR)) |
There was a problem hiding this comment.
This looks like an antipattern which arose from how things are executed for tests. Consider finding a way to remove this code. Sometimes adding an __init__.py file can help, but it depends on what happened that made you add this in (I can't tell from just the code). Pytest can sometimes be a bit cranky about internal imports unless you explicitly tell it that the the test dir is a package with the init file.
| # ============================================================================ | ||
| # FIXTURE: Minimal valid profile | ||
| # ============================================================================ |
There was a problem hiding this comment.
Consider avoiding this bespoke commenting pattern and instead using docstrings inside the fixture definition. Comment applies wherever this pattern popped up.
| # FIXTURE: Small 3D image profile | ||
| # ============================================================================ | ||
| @pytest.fixture | ||
| def small_image_profile() -> TestProfile: |
There was a problem hiding this comment.
Consider using pytest's parametrize to collapse the code needed for repeated iteration over similar data. Comment applies to many tests in this pr.
| @@ -1,4 +1,4 @@ | |||
| """Tests for package export ergonomics.""" | |||
| r"""Tests for package export ergonomics.""" | |||
There was a problem hiding this comment.
The r here looks like a typo. Consider removing.
| "fire>=0.7.1", | ||
| "jinja2>=3.1.6", | ||
| "pandas>=3.0.2", | ||
| "pyarrow>=23.0.1", |
There was a problem hiding this comment.
I couldn't tell exactly, but is pyarrow needed for this PR?
Description
This PR adds data contracts for validating inputs and outputs. This is an initial step and will be added to as other modules get added in the future.
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.